AD-A197  436 



ARI  Research  Note  88-43 


Direct  and  Indirect  Scaling  of 
Membership  Functions  of  Probability 

Phrases 

Amnon  Rapoport,  Thomas  S.  Wallsten, 
and  James  A.  Cox 

University  of  North  Carolina 


for 

Contracting  Officer’s  Representative 
Michael  Drillings 

ARI  Scientific  Coordination  Office,  London 
Milton  S.  Katz,  Chief 

Basic  Research  Laboratory 
Michael  Kaplan,  Director 


U.  S.  Army 

Research  Institute  for  the  Behavioral  and  Social  Sciences 

June  1988 


Approved  for  the  public  release;  distribution  unlimited. 


U. S. ARMY RESEARCH INSTITUTE

FOR  THE  BEHAVIORAL  AND  SOCIAL  SCIENCES 


A  Field  Operating  Agency  under  the  Jurisdiction  of  the 
Deputy  Chief  of  Staff  for  Personnel 


EDGAR  M.  JOHNSON 
Technical  Director 


L.  NEALE  COSBY 
Colonel,  IN 
Commander 


Research  accomplished  under  contract 
for  the  Department  of  the  Army 

Vreuls  Research  Corp. 

Technical  review  by 
Dan  Ragland 


This report, as submitted by the contractor, has been cleared for release to Defense Technical Information Center
(DTIC) to comply with regulatory requirements. It has been given no primary distribution other than to DTIC
and will be available only through DTIC or other reference services such as the National Technical Information
Service (NTIS). The views, opinions, and/or findings contained in this report are those of the author(s) and
should not be construed as an official Department of the Army position, policy, or decision, unless so designated
by other official documentation.


UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT  DOCUMENTATION  PAGE 


1. REPORT NUMBER


ARI  Research  Note  88-43 


4. TITLE (and Subtitle)

Direct  and  Indirect  Scaling  of  Membership 
Functions  of  Probability  Phrases 


7. AUTHOR(s)

Amnon  Rapoport,  Thomas  S.  Wallsten, 
and  James  A.  Cox 




3. RECIPIENT'S CATALOG NUMBER


5. TYPE OF REPORT & PERIOD COVERED

Interim  Report 
October  84  -  October  85 


6. PERFORMING ORG. REPORT NUMBER

Report  No.  174 


8. CONTRACT OR GRANT NUMBER(s)

MDA903-83-K-0347 


9.  PERFORMING  ORGANIZATION  NAME  AND  ADDRESS 

L.L.  Thurstone  Psychometric  Laboratory, 
University  of  North  Carolina, 

Chapel  Hill,  NC  27514 


11. CONTROLLING OFFICE NAME AND ADDRESS

U.S.  Army  Research  Institute  for  the  Behavioral 
and  Social  Sciences,  5001  Eisenhower  Avenue, 
Alexandria,  VA  22333-5600 


10. PROGRAM ELEMENT, PROJECT, TASK
AREA & WORK UNIT NUMBERS


2Q161102B74F 


12.  REPORT  DATE 

June  1988 


13.  NUMBER  OF  PAGES 


i 


14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

Unclassified 


16. DISTRIBUTION STATEMENT (of this Report)

Approved  for  public  release,  distribution  unlimited. 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


18. SUPPLEMENTARY NOTES


Michael  Drillings,  contracting  officer's  representative 


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Probability  Decisions  Decision  Making 
Scaling  Procedures  Psychometrics 

Cognitive  Science  Fuzzy  Sets 


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)


A crucial issue in empirical measurement of membership functions is whether
the degree of fuzziness is invariant under different scaling procedures. In
this paper a direct and an indirect procedure (magnitude estimation and
graded pair-comparison) are compared in the context of establishing
membership functions for probability phrases such as "probable", "rather likely",
"very unlikely", and so forth. Analyses at the level of individual
respondents indicate that (a) membership functions are stable over time, (b)



functions for each phrase differ substantially over people, (c) the two
procedures yield similarly shaped functions for a given person-phrase
combination, (d) the functions from the two procedures differ
systematically, in that those obtained directly dominate, or indicate greater
fuzziness than do those obtained indirectly, and (e) where the two differ
the indirectly obtained function may be the more accurate one. A secondary
purpose of the paper is to evaluate the effects of the modifiers "very"
and "rather". "Very" has a general intensifying effect that is described
by Zadeh's (1972) concentration model for seven subjects and by a shift
model for no one. The effects of "rather" are unsystematic and not
described by any available model.




Abstract 


A crucial issue in the empirical measurement of membership
functions is whether the degree of fuzziness is invariant under
different scaling procedures. In this paper a direct and an
indirect procedure, magnitude estimation and graded
pair-comparison, are compared in the context of establishing
membership functions for probability phrases such as probable,
rather likely, very unlikely, and so forth. Analyses at the
level of individual respondents indicate that (a) membership
functions are stable over time, (b) functions for each phrase
differ substantially over people, (c) the two procedures yield
similarly shaped functions for a given person-phrase combination,
(d) the functions from the two procedures differ systematically,
in that those obtained directly dominate, or indicate greater
fuzziness than do those obtained indirectly, and (e) where the
two differ the indirectly obtained function may be the more
accurate one. A secondary purpose of the paper is to evaluate
the effects of the modifiers very and rather. Very has a general
intensifying effect that is described by Zadeh's (1972)
concentration model for seven subjects and by a shift model for
no one. The effects of rather are unsystematic and not described
by any available model.

The most fundamental concept in the theory of fuzzy sets is
that of a fuzzy subset A of a universe of discourse U, with A
characterized by a membership function μ_A(x) that associates
with each point x ∈ U its "grade of membership" in A. Usually,
but not necessarily, μ_A(x) is assumed to range in the interval
[0,1]. The numbers 0 and 1 correspond then, as assumed in this
paper, to non-membership and full membership, respectively.

The  primary  goal  of  the  present  paper  is  to  compare 
membership  functions  constructed  by  two  alternative  scaling 
procedures,  and  the  secondary  goal  is  to  evaluate  the  effects  of 
certain  modifiers  on  the  membership  functions.  However,  it  is 
necessary  first  to  address  four  issues  and  their  experimental 
implications — the  effects  of  context,  the  subjective  nature  of 
membership  functions,  variability  in  membership  judgments,  and 
the problem of scale type. We discuss the first two issues only
briefly  but  the  latter  two  in  more  detail. 

It  is  generally  recognized  that  membership  functions  depend, 
at  least  to  some  degree,  on  the  context  in  which  they  are 
measured. For example, the grade of membership of a 60 year old
woman in the class of old women may vary from country to country
depending on life expectancy. And the grade of membership of an
event whose probability of occurrence is 0.2 in the class of
unlikely events probably depends on whether the event gives rise
to favorable or unfavorable outcomes. Without an exact
specification of the context and an experimental investigation of
the effects of context on vagueness and fuzziness, comparisons of
membership functions constructed under different experimental
setups may be grossly misleading. Context is held fixed
throughout the present study.


The second issue stems from the fact that grades of
membership are generally subjective. An important experimental
implication of this subjectivity, which has not always been
followed, is that within- rather than between-person designs
should be implemented because averaging over membership functions
from different individuals, even when the context is fixed, is
meaningless.

But even if these two relatively simple issues are properly
addressed, a question arises as to whether the value of μ_A(x) can
be determined accurately. The obvious paradoxical conclusion
that the nature of fuzziness does not permit precise measurement
of grades of membership led to the solution of creating a "type-2"
fuzzy set by laying a second level of membership over the
original "type-1" membership function, then creating a "type-3"
fuzzy set by laying a third level of membership over the "type-2"
membership, and so on. As noted by Norwich and Turksen, "The
resulting infinite regress led to discouragement about whether a
membership function could ever be meaningfully constructed"
(1982c, p. 68).

The  resolution  of  this  paradox  proposed  by  Norwich  and 
Turksen  is  similar  in  spirit  to  the  philosophy  underlying  the 
experimental  testing  of  algebraic  models  in  decision  making  and 
other  areas  of  cognitive  psychology.  It  is  based  on  the 
recognition  that  the  infinite  regress  model  outlined  above  treats 
the  grade  of  membership  on  every  level  as  deterministic,  whereas 


is impossible. As argued by Norwich and Turksen (1982a) and
demonstrated experimentally by Norwich and Turksen (1982b) and
Wallsten, Budescu, Rapoport, Zwick, and Forsyth (1985), the
repeated elicitation of grades of membership for a given subject
in a well defined context by some psychophysical scaling
procedure (e.g., magnitude estimation, paired-comparison) yields
variability in measurement. Norwich and Turksen claim that this
variability "embodies all the uncertainty or imprecision in this
value (and, equivalently, all the information about this value)
which exists in the subject's mind" (1982c, p. 69). One may,
therefore, use the mean of a set of non-deterministic numerical
responses as the grade of membership of a type-1 fuzzy set with
no need to amass higher levels of deterministic membership to
model the subject's fuzziness.
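
The averaging idea can be sketched as follows. This is only an illustration: the function name and the ratings are hypothetical (they are not data from any of the cited studies), and the ratings are assumed to lie on a 0 to 1 scale.

```python
from statistics import mean, stdev


def grade_of_membership(ratings):
    """Return (mean, standard error) of repeated membership ratings.

    The mean serves as the type-1 grade of membership; the standard
    error summarizes the response variability around it.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two replications")
    m = mean(ratings)
    se = stdev(ratings) / len(ratings) ** 0.5
    return m, se


# Hypothetical repeated ratings of one stimulus by one subject.
hypothetical_ratings = [0.70, 0.80, 0.75, 0.65, 0.85]
m, se = grade_of_membership(hypothetical_ratings)
```

Under this reading, the spread of the responses is treated as measurement variability rather than as a second level of membership.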

The  logic  used  by  Norwich  and  Turksen  to  resolve  the 
infinite  regress  paradox  may  be  used  against  the  form  of  their 
argument  leading  to  their  claim  that  the  membership  function  is 
measurable  at  most  on  an  interval  scale.  Norwich  and  Turksen 
have  correctly  contended  that  since  a  subject  is  not  precisely 
sure  of  the  meaning  of  a  subjective  concept,  it  is  contradictory 
to  the  notion  of  fuzziness  to  assert  that  this  concept  partitions 
the universe of discourse precisely into three regions where
μ_A(x) = 0, μ_A(x) = 1, and 0 < μ_A(x) < 1, respectively, the
boundaries of which can be precisely determined (to within the
limits of physical discriminability) (Norwich & Turksen, 1982a,
1984). Norwich and Turksen base this claim on the observation
that  the  condition  for  a  natural  zero  for  a  membership  structure 
will most likely not be met by a subject. Their claim, however,
presupposes deterministic responses, whereas in practice the
boundaries of the three regions determined by the membership
function are determined statistically, just as are absolute
thresholds and difference thresholds in psychophysical scaling.
Precisely as one may employ the mean of a set of
non-deterministic responses as the grade of membership of a
particular element, one may use the mean of non-deterministic
responses to determine the region's boundaries.

Our  argument  above  illustrates  the  point  that  questions 
regarding  response  variability  are  logically  distinct  from  those 
regarding scale type. With respect to this final issue, Goguen
(1969) has stated that the membership function can be no stronger
than an ordinal scale, Norwich and Turksen (1982a, 1984) have
claimed interval scale strength for the membership function,
Saaty (1974) has espoused the ratio scale, whereas Thole,
Zimmerman, and Zysno (1979) have asserted that grades of
membership are measurable on an absolute scale. To resolve this
controversy, it must be recognized that scale type is not
arbitrary, but rather it minimally depends on properties of the
particular measurement procedure utilized. Additional
assumptions, ideally ones that are testable, then may be invoked
on theoretical or pragmatic grounds to yield stronger scales.

Experimental Measurement of Membership Functions

Although much has been written about the measurement of
fuzziness and vagueness, empirical work has been relatively
sparse (e.g., Hersh & Caramazza, 1976; Hersh, Caramazza, &
Brownell, 1979; Kuz'min, 1981; MacVicar-Whelan, 1978; Norwich &
Turksen, 1982b, 1984; Oden, 1977a, 1977b; Thole, Zimmerman, &
Zysno, 1979; Wallsten et al., 1985; Zysno, 1981). Moreover, with
the exception of an experiment by Norwich and Turksen (1982b,
1984) the empirical studies have each employed only a single
scaling procedure to construct membership functions. They have
not determined whether the resulting membership functions are
invariant under various scaling procedures and if not, what
the relationships are between membership functions obtained from
the same subject by different procedures. The investigation of
the invariance of fuzziness under different scaling procedures is
as important to fuzzy set theory as is the study of invariance of
other subjective perceptions to psychophysical theories. For
example, discussing the theory of signal detection and commenting
on three different procedures, the yes-no, rating, and
forced-choice tasks, for measuring the detection of weak signals in

noise, Green and Swets (1966) wrote:

"Comparing the results obtained from these different
procedures is extremely important because such comparisons
provide the major test of the validity of the
decision-theory analysis. If the analysis is verified, it yields
measures of the detectability of the signal that are
independent of the procedure used to estimate these
measures. Therefore, this analysis holds forth the
possibility of psychophysical relations which are
independent of procedure, a goal more often hoped for than
achieved" (p. 31).

Once "fuzzy set theory" is substituted for the "decision-theory
analysis" and "fuzziness of the concept" for "detectability of
the signal," this statement holds true also for the measurement
of imprecise concepts by fuzzy set theory.

The exception mentioned above is a study by Norwich and
Turksen (1982b), who employed two scaling procedures, direct
rating (also known as magnitude estimation) and reverse rating.
For  the  direct  rating  procedure,  the  subject  was  presented  one  at 
a  time  with  a  life-sized  wooden  male  figure  of  adjustable  height 
or  a  cardhouse  with  adjustable  dimensions.  The  subject  was  then 
asked  to  rate  the  tallness  of  the  man  or  the  aesthetic 
pleasantness  of  the  house  by  adjusting  the  position  of  a  pointer 
on  a  horizontal  line  segment.  The  left  end-point  of  the  segment 
corresponded  to  all  men/houses  which  he  or  she  felt  were 
definitely  not  tall/pleasing,  the  right  end-point  to  all  those 
that were definitely tall/pleasing, and the line segment between
these two end-points was interpreted to represent how strongly
the subject agreed that the man/house was tall/pleasing. For the
reverse  rating  procedure,  the  pointer  was  randomly  set  to  some 
position  in  the  segment  and  the  subject  adjusted  the 
height/dimensions  of  the  man/house  to  correspond  to  the  degree  of 
membership  indicated  by  the  position  of  the  pointer.  In  all 
cases,  any  particular  stimulus  was  direct  or  reverse  rated  a 
minimum  of  nine  times  by  the  subject  and  the  average  was  then 
used  to  assess  the  membership  functions  of  "tall"  and 
“aesthetically  pleasing.”  Norwich  and  Turksen  report  that  "It 
has  been  found  for  all  subjects  that  direct  and  reverse  ratings 
appear  to  yield  the  same  membership  function  (this  conclusion 
will  be  checked  by  statistical  hypothesis  testing)  and  that  the 
axioms  of  the  algebraic-difference  structure  are  obeyed"  (1984, 
p.  12).  Detailed  comparisons  of  the  outcomes  of  the  two  scaling 
procedures  or  of  the  axiom  tests  have  not  been  provided. 
Furthermore, the algebraic-difference axioms refer to a
comparison procedure, and therefore their satisfaction does not
validate the very different direct or reverse rating procedures.

Major Goals of the Present Study

Both the direct rating and reverse rating are examples of
direct psychophysical scaling techniques, which have yielded
stable relations within psychology between physical
scale values and estimates of psychological magnitudes over a
wide variety of sensory continua. However, as also noted by
Thole et al. (1979), psychological scales developed with direct
methods may be distorted by a number of response biases. Because
of the difficulties that have plagued many attempts to measure
psychological magnitudes directly, some psychophysicists, notably
Fechner (1860) and Thurstone (1927), have turned to
discriminability or confusability between stimuli as a method for
inferring psychological magnitudes indirectly (Shepard, 1981).
Thurstone's model, which was originally formulated as a
psychophysical theory, and signal detection theory (e.g., Green &
Swets, 1966), which emphasizes judgmental as well as sensory
determinants, are examples of indirect psychophysical analyses.
Recognizing some of the major advantages of indirect scaling
procedures, Thole et al. remarked that "As yet, however, no
practical indirect technique, the result of which is more than an
interval scale, is available" (1979, p. 170). However, this
situation was addressed, at least in part, by the work of
Wallsten et al. (1985), which utilized a graded pair-comparison
procedure in two experiments to test the algebraic-difference
axioms and to measure membership on an interval scale. The
functions had interpretable shapes and predicted an independent
set of judgments.


A major purpose of the present study is to compare two
scaling procedures for measuring membership functions, a direct
rating and the indirect graded pair-comparison procedure
developed by Wallsten et al. (1985). The theoretical
significance of investigating the invariance of fuzziness, or
lack of it, under different scaling procedures has already been
mentioned above. But there is also a practical reason for the
comparison. Indirect scaling procedures are notoriously more
laborious than direct ones (Thole et al., 1979), requiring a
multiple of n(n-1)/2 rather than of n judgments. Our comparison
of the two scaling procedures should show whether the membership
function remains invariant when the less taxing procedure is
used.
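
To make the difference in workload concrete, the following sketch counts the judgments needed per replication for n stimuli under each kind of procedure. The function name is hypothetical; the counts follow directly from the n versus n(n-1)/2 comparison in the text.

```python
from math import comb


def judgments_per_replication(n):
    """Judgments needed to cover n stimuli once.

    Direct scaling rates each stimulus on its own (n judgments);
    pair comparison judges every unordered pair (n(n-1)/2 judgments).
    """
    return {"direct": n, "pairwise": comb(n, 2)}


# Six points per membership function, as used later in this study.
counts = judgments_per_replication(6)
```

With six stimuli the pairwise count is 15, two and a half times the direct count, and the ratio grows linearly with n.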

The imprecise concepts whose membership functions were to be
measured consisted of probability phrases such as probable,
unlikely, possible, and so forth, with which most people,
including experts in medical diagnosis, military intelligence,
and weather forecasting, generally prefer communicating their
uncertain opinions. Wallsten et al. (1985) also utilized
probability phrases. A second purpose of the present study is to
examine the effects of modifiers like very and rather on the
shape and interpretation of membership functions for probability
phrases.

Method 

Four groups of five subjects each were employed, each
responding to a different set of five probability phrases. Table
1 shows the probability phrases for each group. Altogether the
membership functions of 13 probability phrases were scaled. The
phrase possible was presented to all 20 subjects; the four
phrases probable, improbable, likely, and unlikely were each
presented to 10 subjects; and the same four phrases each modified
by rather or very were each presented to 5 subjects. In an
attempt to minimize context effects, each group was presented
with the same mix of "high" and "low" phrases. Each had a high
probability term (probable or likely), a low probability term
(improbable or unlikely), as well as one of those terms modified
by rather and the other by very, plus a single "neutral" phrase
(possible). Antonyms were not presented in the same group.


Table 1

Probability Phrases Used in the Experiment

                               Group

1                2                3                  4

Very probable    Probable         Very likely        Rather likely
Probable         Rather probable  Likely             Likely
Possible         Possible         Possible           Possible
Unlikely         Unlikely         Improbable         Improbable
Rather unlikely  Very unlikely    Rather improbable  Very improbable
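
As a consistency check on the design, the assignment of phrases to groups in Table 1 can be written out and the subject counts stated in the text recomputed. The variable names are hypothetical; the phrase lists are read off Table 1 with five subjects per group.

```python
from collections import Counter

# Phrases judged by each group, as listed in Table 1.
TABLE_1 = {
    1: ["very probable", "probable", "possible", "unlikely", "rather unlikely"],
    2: ["probable", "rather probable", "possible", "unlikely", "very unlikely"],
    3: ["very likely", "likely", "possible", "improbable", "rather improbable"],
    4: ["rather likely", "likely", "possible", "improbable", "very improbable"],
}
SUBJECTS_PER_GROUP = 5

# Number of subjects who judged each phrase.
subjects_per_phrase = Counter()
for phrases in TABLE_1.values():
    for phrase in phrases:
        subjects_per_phrase[phrase] += SUBJECTS_PER_GROUP

n_distinct_phrases = len(subjects_per_phrase)
```

The tally reproduces the counts given above: 13 distinct phrases, with possible judged by all 20 subjects, each unmodified term by 10, and each modified term by 5.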

Subjects.  Subjects  were  social  science  and  business 
graduate  students  at  the  University  of  North  Carolina  at  Chapel 
Hill. They were recruited by placing notices in graduate student
mailboxes in the school of business and the departments of
political  science,  economics,  library  science,  psychology,  and 
sociology.  None  of  the  subjects  had  participated  in  any  similar 
experiments  on  the  measurement  of  membership  functions.  The 
notices  described  the  general  nature  of  the  study  and  promised 
the  subjects  $25  each  for  three  sessions  of  approximately  45 
minutes  each.  Twenty  native  speakers  of  English  were  randomly 
assigned  to  Groups  1,  2,  3,  or  4.  As  explained  above  in 
conjunction  with  Table  1,  the  groups  differed  only  in  terms  of 
the  phrases  they  judged. 

General  procedure.  Subjects  were  run  individually  for  a 
practice  session  followed  by  two  data  sessions,  with  the  sessions 
scheduled  generally  one  to  two  days  apart.  The  experiment  was 
controlled  by  an  IBM  PC  with  stimuli  presented  on  a  color  monitor 
and  responses  made  with  a  joystick.  During  the  practice  session 
all the subjects judged the probability phrases tossup, good
chance, and poor chance, whereas during Sessions 2 and 3 they
judged  the  phrases  indicated  in  Table  1. 

Each session consisted of three parts, all of which employed
one or more spinners drawn on the computer monitor. Part 1
employed one spinner, part 2 employed two spinners, and part
3 employed six spinners, as described below. Each spinner was
divided radially into two sectors, one red and the other yellow.
The subjects were instructed to imagine a pointer over each
spinner that could be spun so that it randomly lands over either
the red or the yellow sector. Thus, without ever using numbers,
each spinner displayed a probability of the pointer landing on
yellow.

At the beginning of the experiment, subjects were instructed
that the purpose of the study was "to determine how people use
and understand nonnumerical probability phrases, such as good
chance, poor chance, and tossup, for communicating judgments
about uncertainty." Subjects were told that there were no right
or wrong answers, just individual judgments.

Subjects then were instructed how to use a joystick to move
an arrow to the left or right on a straight line and how to push
buttons on the joystick assembly to register their responses.
Several practice trials followed to provide the subjects
experience with the joystick controls.

The subsequent instructions were divided into three parts,
each corresponding to a different part of the session. The
instructions for part 1 were intended to elicit membership
functions for the same phrases by a direct scaling procedure
referred to hereafter as direct estimation (DE). The instructions
for part 2 involved the assessment of membership functions of the
five probability phrases by the indirect scaling procedure
employed by Wallsten et al. (1985), referred to hereafter as
pair-comparison (PC). The instructions for part 3, which was
designed to allow a comparison between the DE and PC procedures,
involved a measurement of membership functions of the same five
probability phrases on an ordinal scale only. A rank ordering
(RO) scaling procedure was used for this purpose. Because the
results depend strongly on characteristics of the experimental
procedures, each part will now be described in more detail.

Part 1. Five probability phrases were presented in this and
the subsequent two parts (cf. Table 1). Associated with each
phrase were six probabilities, each displayed as the proportion of
yellow on a spinner. The probabilities associated with the "low"
terms (improbable, rather improbable, very improbable, unlikely,
rather unlikely, very unlikely) were 0.05, 0.14, 0.23, 0.32,
0.41, and 0.50; the probabilities associated with each of the
"high" terms (probable, rather probable, very probable, likely,
rather likely, very likely) were 0.50, 0.59, 0.68, 0.77, 0.86,
and 0.95; and the probabilities associated with the single
"neutral" phrase (possible) were 0.32, 0.41, 0.50, 0.59, 0.68,
and 0.77. Thus, each membership function was approximated by six
points with a difference of 0.09 between adjacent points on the
probability continuum. Practical considerations determined that
six points be used to approximate what are essentially continuous
membership functions, and the previous results of Wallsten et al.
(1985) determined the probability ranges for the "low" [0.05,
0.50], "high" [0.50, 0.95], and "neutral" [0.32, 0.77]
probability phrases.
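
The three sets of spinner probabilities share one structure: six points spaced 0.09 apart, differing only in their starting value. A minimal sketch, with hypothetical names, that regenerates the values listed above:

```python
def probability_grid(start, step=0.09, points=6):
    """Six equally spaced spinner probabilities beginning at `start`."""
    # Round to two decimals to remove floating-point noise.
    return [round(start + i * step, 2) for i in range(points)]


GRIDS = {
    "low": probability_grid(0.05),      # improbable, unlikely, and modified forms
    "neutral": probability_grid(0.32),  # possible
    "high": probability_grid(0.50),     # probable, likely, and modified forms
}
```

The grids overlap: the "neutral" grid shares its lower three points with the "low" grid and its upper four with the "high" grid, and 0.50 appears in all three.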

Each phrase-probability combination was presented twice. Thus,
part 1 consisted of 5 phrases by 6 probabilities per phrase by 2
replications for a total of 60 trials. The probability phrase
was always displayed at the top of the screen and a spinner
divided into red and yellow sectors was drawn below it. The 60
phrase by probability combinations were presented in a random
order.
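
The trial structure of part 1 can be sketched as below for the Group 1 phrases. This is an illustration under stated assumptions: the function name, the seeded shuffle, and the use of a single shuffle over all 60 trials are hypothetical, since the text says only that the order was random.

```python
import random
from collections import Counter

# Spinner probabilities from the text; phrases are those of Group 1.
LOW = [0.05, 0.14, 0.23, 0.32, 0.41, 0.50]
NEUTRAL = [0.32, 0.41, 0.50, 0.59, 0.68, 0.77]
HIGH = [0.50, 0.59, 0.68, 0.77, 0.86, 0.95]

GROUP_1 = {
    "very probable": HIGH,
    "probable": HIGH,
    "possible": NEUTRAL,
    "unlikely": LOW,
    "rather unlikely": LOW,
}


def build_de_trials(phrase_probs, replications=2, seed=None):
    """List every (phrase, probability) pair `replications` times, shuffled."""
    trials = [
        (phrase, p)
        for phrase, probs in phrase_probs.items()
        for p in probs
        for _ in range(replications)
    ]
    random.Random(seed).shuffle(trials)  # assumed: one shuffle over all trials
    return trials


de_trials = build_de_trials(GROUP_1, seed=0)
de_counts = Counter(de_trials)
```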


The instructions for the DE procedure read in part:

"On each trial in part 1, a probability phrase will appear
at the top of the computer screen. One spinner will be
drawn directly below this phrase. You are to indicate how
well the probability phrase describes the probability of
landing on yellow for the displayed spinner. If you think
that the spinner probability is very well described by the
phrase, move the arrow to the right. If the spinner
probability is not at all well described by that phrase move
the arrow far to the left. The relative location of the
arrow on the line should correspond to how well (right) or
how poorly (left) the phrase describes the probability."

The joystick was used to move the arrow on the screen, and a
button on the joystick box was used to register the response when
the arrow was suitably placed. The arrow could be positioned at
any of 200 equally spaced locations on the line.
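
One way to turn an arrow location into a number is sketched below. The linear mapping of the 200 locations onto the interval [0, 1] is an assumption made for illustration; the text does not state how raw locations were scored, and the function name and zero-based indexing are hypothetical.

```python
N_LOCATIONS = 200  # number of arrow positions stated in the text


def arrow_to_rating(index, n_locations=N_LOCATIONS):
    """Map a zero-based arrow location to a rating in [0, 1].

    Assumed convention: index 0 is the far left ("not at all well
    described") and index n_locations - 1 is the far right.
    """
    if not 0 <= index < n_locations:
        raise ValueError("arrow index out of range")
    return index / (n_locations - 1)


leftmost = arrow_to_rating(0)
rightmost = arrow_to_rating(N_LOCATIONS - 1)
```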

Part 2. On each trial a single probability phrase was
written at the top of the screen, and two spinners displaying
different probabilities of yellow were drawn below it, one on the
left and one on the right. The instructions for the PC procedure
read in part:

“In  this  part  of  the  experiment  you  are  to  move  the  arrow 
along  the  line  towards  the  spinner  which  is  best  described 
by  the  phrase  at  the  top  of  the  screen.  The  distance  you 
move  the  arrow  toward  one  of  the  spinners  should  reflect 
your  confidence  in  that  judgment.  So  if  you  think  that  the 
probability  of  yellow  on  one  of  the  spinners  is  very  much 
better  described  by  the  phrase  than  is  the  probability  of 
yellow  on  the  other  spinner,  move  the  arrow  all  the  way  to 
that  end  of  the  line.  If  the  phrase  describes  the 
probability  on  one  spinner  only  slightly  better  than  the 
other,  move  the  arrow  just  slightly  off  center." 

Each phrase was associated with the same six probabilities
as described in part 1. However, rather than presenting each
phrase-probability combination twice as in part 1, each phrase
was now presented once with each of the 6 x 5/2 = 15 spinner
pairs, for a total of 75 trials. Phrases and spinner pairs were
presented in a random, not a blocked order.
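
The part 2 trial list can be sketched in the same spirit, again for the Group 1 phrases. The function name is hypothetical, and the random assignment of each pair's members to the left and right sides is an assumption; the text does not say how sides were assigned.

```python
import random
from itertools import combinations

# Spinner probabilities from the text; phrases are those of Group 1.
LOW = [0.05, 0.14, 0.23, 0.32, 0.41, 0.50]
NEUTRAL = [0.32, 0.41, 0.50, 0.59, 0.68, 0.77]
HIGH = [0.50, 0.59, 0.68, 0.77, 0.86, 0.95]

GROUP_1 = {
    "very probable": HIGH,
    "probable": HIGH,
    "possible": NEUTRAL,
    "unlikely": LOW,
    "rather unlikely": LOW,
}


def build_pc_trials(phrase_probs, seed=None):
    """Pair each phrase once with every unordered pair of its probabilities."""
    rng = random.Random(seed)
    trials = []
    for phrase, probs in phrase_probs.items():
        for a, b in combinations(probs, 2):
            # Assumed: which spinner appears on the left is chosen at random.
            left, right = (a, b) if rng.random() < 0.5 else (b, a)
            trials.append((phrase, left, right))
    rng.shuffle(trials)  # random, not blocked, order across phrases
    return trials


pc_trials = build_pc_trials(GROUP_1, seed=0)
```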


Part 3. On each trial a single probability phrase was
printed at the top of the screen and six spinners each displaying
a different probability of yellow were drawn below it. The
subject's task on each trial was "to rank order the six spinners
according to how well they are described by that phrase." This
was  done  by  moving  a  cursor  to  the  spinner  best  described  by  the 
phrase  and  then  pressing  a  button  to  register  the  response.  Once 
the  response  was  registered,  the  designated  spinner  vanished  from 
the  screen.  The  same  procedure  was  repeated  six  times  until  all 
six  probabilities  were  rank  ordered.  Each  of  the  five  phrases 
was  presented  twice. 

For 10 of the 20 subjects (2, 3, 2, and 3 subjects in Groups
1, 2, 3, and 4, respectively) the DE procedure preceded the PC
procedure in session 2, whereas for 10 other subjects the order
of parts 1 and 2 was reversed in that session. Part 3 always
followed parts 1 and 2. Sessions 2 and 3 were identical, except
that the order of parts 1 and 2 was reversed.

Results 

This  section  begins  with  an  examination  of  the  psychometric 
properties  of  the  separate  scales  derived  from  the  DE  and  PC 
procedures,  following  which  the  membership  functions  established 
by the two scaling procedures are compared. Effects of the
modifiers rather and very are examined in the last part of this
section.

The  PC  Procedure 

To describe the scaling of the responses in part 2, several
terms must first be defined. Recall that the probabilities on
the left and right spinners were changed from trial to trial such
that, ignoring order, all probability pairs were presented once
according to a left side by right side, P x P, factorial design,
$P = \{p_1, \ldots, p_n\}$, where the $p_1, \ldots, p_n$ denote
specific probabilities of the spinners landing on yellow. We
shall consider the bounded response line on the CRT to extend
from 1 on the left to 0 on the right, and let $R_W(i,j)$ denote the
response when probability $p_i$ is represented by the left spinner,
$p_j$ is represented by the right spinner, and the phrase W is
displayed above them.

By entering the response $R_W(i,j)$ in cell $(i,j)$ and
$1 - R_W(i,j)$ in cell $(j,i)$, an ordering is induced on the
factorial design according to the degree that the left hand
probability is better described by the phrase than is the right
hand probability. If this ordering satisfies the axioms of an
algebraic difference structure (Krantz, Luce, Suppes, & Tversky,
1971), a suitable transformation of the cell entries can be used
in a difference or a ratio scaling model to establish a
membership function for the phrase W (Wallsten et al., 1985).

More specifically, let $(p_i, p_j)$ be a cell in the P x P
factorial design. Denote the rank ordering between any pair of
cells by $\succsim_W$, where the subscript indicates that the ordering is
for phrase W. Recall that the ordering is induced by placing an
arrow on the response line, so that the further to the left an
arrow is for a pair of probabilities (spinners), the higher in
the rank ordering is the pair. Formally,

$$(p_i, p_j) \succsim_W (p_k, p_l) \iff R_W(i,j) \ge R_W(k,l).$$

Krantz et al. (1971) proved that if the ordered matrix
$(P \times P, \succsim_W)$ satisfies several plausible axioms, then there exists a
mapping $\mu_W$ from P into the real numbers such that

$$(p_i, p_j) \succsim_W (p_k, p_l) \iff \mu_W(p_i) - \mu_W(p_j) \ge \mu_W(p_k) - \mu_W(p_l),$$

or, equivalently, such that

$$(p_i, p_j) \succsim_W (p_k, p_l) \iff \mu_W(p_i)/\mu_W(p_j) \ge \mu_W(p_k)/\mu_W(p_l).$$

These  two  equations  state  that  scale  values  can  be  assigned  to 
these probabilities such that the rank order of differences (or
of ratios) in the assigned values matches the rank order of
differences (or of ratios) in the degrees to which the lefthand
and righthand probabilities are described by the phrase. The
derived scale values are unique up to a linear (for the difference
representation) or a power (for the ratio representation)
transformation. Normalized to be non-negative with an arbitrary
maximum  of  1,  these  scale  values  can  be  taken  as  the  (discrete) 
membership  function  representing  the  degree  to  which  each 
probability  belongs  to  the  vague  concept  defined  by  the 
probability  phrase. 

Computationally, the scale values under the difference model
are obtained by taking arithmetic means of the rows of the n x n
matrix $D = (D_W(i,j))$, where

$$D_W(i,j) = [R_W(i,j) - 0] - [1 - R_W(i,j)] = 2R_W(i,j) - 1, \tag{1}$$

whereas the scale values under the ratio model are obtained by
taking the geometric means of the rows of the n x n matrix
$S = (S_W(i,j))$, where

$$S_W(i,j) = \frac{R_W(i,j) - 0}{1 - R_W(i,j)} = \frac{R_W(i,j)}{1 - R_W(i,j)}. \tag{2}$$

To avoid division by zero in the latter equation, the responses
$R_W(i,j) = 0, 1$ were set equal to 0.0025 and 0.9975,
respectively.
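
The two scaling computations can be sketched as follows. This is a minimal illustration, not the authors' program: the function names, the dictionary layout for the judged pairs, and the default clipping bounds are assumptions introduced here.

```python
import math

def complete_responses(upper, n):
    """Build the full n x n response matrix from the judged pairs.

    `upper` maps (i, j) with i < j to the response R(i, j) in [0, 1];
    cell (j, i) receives the complementary response 1 - R(i, j).
    Diagonal cells stay None because no spinner was paired with itself.
    """
    R = [[None] * n for _ in range(n)]
    for (i, j), r in upper.items():
        R[i][j] = r
        R[j][i] = 1.0 - r
    return R

def difference_scale(R):
    """Row arithmetic means of D(i, j) = 2 R(i, j) - 1, with 0 on the diagonal."""
    n = len(R)
    return [sum(0.0 if i == j else 2.0 * R[i][j] - 1.0 for j in range(n)) / n
            for i in range(n)]

def ratio_scale(R, lo=0.0025, hi=0.9975):
    """Row geometric means of S(i, j) = R / (1 - R), with 1 on the diagonal.

    Responses of exactly 0 or 1 are replaced by `lo` and `hi` (illustrative
    defaults) so that the ratio is always defined.
    """
    n = len(R)
    scale = []
    for i in range(n):
        log_sum = 0.0  # the diagonal entry 1 contributes log(1) = 0
        for j in range(n):
            if i != j:
                r = R[i][j]
                r = lo if r <= 0.0 else hi if r >= 1.0 else r
                log_sum += math.log(r / (1.0 - r))
        scale.append(math.exp(log_sum / n))
    return scale

def normalize(values):
    """Shift to a minimum of 0 and rescale to a maximum of 1 (difference model)."""
    low = min(values)
    shifted = [v - low for v in values]
    top = max(shifted)
    return [v / top for v in shifted] if top > 0 else [1.0] * len(values)

# Toy example with three probabilities instead of six.
upper = {(0, 1): 0.75, (0, 2): 1.0, (1, 2): 0.5}
R = complete_responses(upper, 3)
print(normalize(difference_scale(R)))
print(ratio_scale(R))
```

The shift-and-rescale normalization is appropriate only for the difference model, whose scale values are unique up to a linear transformation.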


For more details regarding these two methods of scaling see
Wallsten et al. (1985), and for a discussion of alternative ratio
scaling procedures see de Jong (1984), Jensen (1984), Saaty
(1977, 1980), Saaty and Vargas (1984), and Williams and Crawford
(1980).

It should be noted that at an axiomatic level no distinction
can be made between the difference and ratio representations
unless different orderings appear under difference- and ratio-
inducing conditions (Birnbaum, 1980; Miyamoto, 1983). This is so
because any set of differences can be mapped into a set of ratios
by exponentiating, and conversely, any set of ratios can be
mapped into a set of differences by taking logarithms. However, the
representations can be compared to each other in terms of the
correlations computed separately for each model between the
observed responses and the responses predicted from the derived
scale values (Wallsten et al., 1985).


To compute these correlations, three steps were taken.
First, we computed the measures $D_W(i,j)$ and $S_W(i,j)$ from
equations (1) and (2) and subsequently completed the matrices D
and S by inserting the complementary measures $D_W(j,i) = -D_W(i,j)$
and $S_W(j,i) = 1/S_W(i,j)$ and placing 0's and 1's in the
main diagonals of D and S, respectively. Second, the matrices D
and S, computed separately for each subject, session, and phrase,
were each scaled according to the difference model and ratio
model, respectively. Finally, the derived scale values were used
to compute for each model separately the predicted responses,
$\hat{R}_W(i,j)$, and the correlations between $R_W(i,j)$ and
$\hat{R}_W(i,j)$ were then computed for each subject, session, and phrase
separately (each based on 15 pairs of observations).
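
The three steps can be sketched for the difference model as follows. The text does not spell out how predicted responses were obtained, so this sketch assumes that the predicted difference is the gap between two scale values, mapped back to the response line by inverting equation (1). All function names are hypothetical.

```python
import math

def difference_scale_from_pairs(upper, n):
    """Steps 1 and 2: complete the matrix D and take row arithmetic means."""
    D = [[0.0] * n for _ in range(n)]          # zeros on the main diagonal
    for (i, j), r in upper.items():
        D[i][j] = 2.0 * r - 1.0                # equation (1)
        D[j][i] = -D[i][j]                     # same as 2 * (1 - r) - 1
    return [sum(row) / n for row in D]

def predicted_response(scale, i, j):
    """Step 3 (assumed form): invert equation (1) on the predicted difference."""
    return (scale[i] - scale[j] + 1.0) / 2.0

def pearson(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def model_fit(upper, n):
    """Correlation between observed and predicted responses over all judged pairs."""
    scale = difference_scale_from_pairs(upper, n)
    pairs = sorted(upper)
    observed = [upper[p] for p in pairs]
    predicted = [predicted_response(scale, i, j) for i, j in pairs]
    return pearson(observed, predicted)

# Toy data with three probabilities; six probabilities would give 15 pairs.
print(model_fit({(0, 1): 0.70, (0, 2): 0.80, (1, 2): 0.55}, 3))
```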

The mean correlations between the observed responses $R_W(i,j)$
and the predicted responses $\hat{R}_W(i,j)$ over all
the 20 subjects are shown in Table 2. To maintain an equal
number of cases for each mean, the phrases probable and
improbable were pooled together (column 2) as were the two
phrases likely and unlikely (column 3), the four phrases modified
by rather (column 4), and the four phrases modified by very
(column 5). Each mean in Table 2 is, therefore, based on 20
correlations, one per subject.

A 2 x 2 x 5, model (ratio, difference) by session (2, 3) by
probability phrase (possible, probable/improbable,
likely/unlikely, rather ( ), and very ( )), ANOVA with repeated
measures on all three factors was conducted on the correlations
between the observed responses $R_W(i,j)$ and the predicted
responses $\hat{R}_W(i,j)$. All three factors were highly
significant (F = 17.15, p < 0.01, F = 8.86, p < 0.01, and F =
4.08, p < 0.01 for model, session, and phrase type,
respectively). The same ANOVA conducted on the z-transforms of
the correlations yielded similar results. As shown in Table 2,
the correlations between observed and predicted responses for the
difference model exceeded on the average those for the ratio
model; the correlations in Session 3 were higher on the average
than those in Session 2; and the correlations for the probability
phrase possible were lower on the average than the correlations
for all other probability phrases.


The significant session effect may be due either to changes
in the shapes of the membership functions from Session 2 to 3 or
to fewer response errors due to learning. To choose between
these two hypotheses, the reliability of the scaled values was
assessed by computing the correlation between the derived scale
values in Sessions 2 and 3 for each subject and model separately
over the probability phrases. With 5 different phrases and 6
scale values per phrase, each correlation was based on 30 pairs
of observations. Of the 20 correlations computed under the ratio
model, all but one were highly significant (p < 0.01).
Similarly, all the 20 correlations computed under the difference
model were highly significant (p < 0.01). The mean correlations
over subjects were 0.75 and 0.86 for the ratio and difference
models, respectively, demonstrating high reliability of the
derived scale values and providing additional evidence for the
superiority of the difference model over the ratio model.

Based  on  the  test  results  reported  above,  we  decided  to  use 
the  Session  3  judgments  only,  to  take  the  scale  values  derived 
from  the  difference  rather  than  the  ratio  model  as  the  PC 
membership  functions,  and  to  analyze  the  various  membership 
functions  for  each  subject  separately. 

The  DE  Procedure 

For  each  session,  the  responses  of  the  two  replications  were 
averaged  and  the  five  membership  functions  with  six  points  per 
function  were  determined  directly  from  the  mean  responses. 
Reliability  of  the  responses  was  assessed  by  computing  the 
correlation  between  the  mean  responses  in  Sessions  2  and  3  over 
the five phrases for each subject separately. As in the PC
procedure, each correlation was based on 30 pairs of mean
responses. All 20 correlations were highly significant (p <
0.01). The mean correlation was 0.86, equal to the mean
correlation computed for the difference model under the PC
procedure. Because of the evidence for session-to-session
learning reported in the preceding section and the high
correlation of the responses between sessions, it was decided to
take the scale values from Session 3 as the DE membership
functions.


The RO Procedure


As  in  the  DE  procedure,  the  two  rankings  in  each  session 
were  first  averaged  and  then  five  membership  functions  (unique  up 
to  an  order  preserving  transformation)  with  six  points  per 
function  were  determined  directly  for  each  subject  and  session. 
Reliability  of  the  rankings  was  assessed  again  by  computing  rank 
order  correlations  between  the  averaged  responses  in  Sessions  2 
and  3.  All  20  correlations,  each  based  on  30  pairs  of 
observations, were highly significant (p < 0.01). The
correlations ranged between 0.66 and 0.?? with a mean of 0.86,
which, surprisingly, is exactly equal to the two previous
reliability correlations.

Reasonableness  of  the  Membership  Functions 


Concluding that the three scaling procedures are equally
reliable, we next turn to the scale values to consider how
reasonable they are as membership functions. For this purpose,
the derived scale values from Session 3 were each normalized to
have a maximum at 1, and were plotted separately for each phrase
and each subject as a function of the spinner probabilities of
landing on yellow. The solid lines in Figures 1 through 5
represent the membership functions elicited by the DE procedure
and the broken lines those elicited by the PC procedure. Figure
1 shows pairs of membership functions for the phrase possible.
Figure 2 portrays ten pairs of functions for the phrase probable
(subjects 1 through 10) and ten for improbable (subjects 11
through 20). The membership functions for the pair of antonyms
unlikely (subjects 1 through 10) and likely (subjects 11 through
20) are shown in Figure 3. Figure 4 shows pairs of functions for
the probability phrases rather unlikely (subjects 1 through 5),
rather probable (subjects 6 through 10), rather improbable
(subjects 11 through 15), and rather likely (subjects 16 through
20). Finally, Figure 5 displays the membership functions for the
phrases very probable (subjects 1 through 5), very unlikely
(subjects 6 through 10), very likely (subjects 11 through 15),
and very improbable (subjects 16 through 20). Recall that the
probability  values  on  the  abscissas  vary  between  probability 

Figure 1. Individual membership functions for possible. DE
functions are indicated by solid lines and PC by
broken lines. Numbers in the panels refer to
individual subjects.

Figure 2. Individual membership functions for probable (subjects
1-10) and improbable (subjects 11-20). DE functions
are indicated by solid lines and PC by broken lines.

Figure 3. Individual membership functions for likely (subjects
11-20) and unlikely (subjects 1-10). DE functions are
indicated by solid lines and PC by broken lines.

Figure 4. Individual membership functions for rather likely
(subjects 16-20), rather probable (subjects 6-10),
rather unlikely (subjects 1-5), and rather improbable
(subjects 11-15). DE functions are indicated by solid
lines and PC by broken lines.

Figure 5. Individual membership functions for very likely
(subjects 11-15), very probable (subjects 1-5), very
unlikely (subjects 6-10), and very improbable
(subjects 16-20). DE functions are indicated by solid
lines and PC by broken lines.


Table 3 summarizes the types of membership functions found
over all subjects for low, neutral, and high probability phrases.
The two low phrases improbable and unlikely are grouped together
as are the two high phrases probable and likely. The bottom four
rows of Table 3 show the latter two pairs of phrases modified by
rather and very. Membership functions can be characterized as
either monotonic increasing, monotonic decreasing, single peaked,
or other. In classifying the 100 pairs of functions we allowed
one inversion per function provided its magnitude did not exceed
0.10. For example, the PC elicited function for subject 3 in
Figure 1 is classified as "monotonic increasing," whereas the DE
elicited function for subject 16 in Figure 1 is classified as
"other."

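The classification rule can be sketched in code. The report states only the tolerance (one inversion of at most 0.10 per function); the order in which the categories are tried, and the treatment of a tolerated inversion inside a single peaked function, are assumptions of this sketch rather than the authors' documented procedure.

```python
def _violations(values, direction):
    """Sizes of the steps that run against `direction` (+1 rising, -1 falling)."""
    return [abs(b - a) for a, b in zip(values, values[1:]) if (b - a) * direction < 0]

def _acceptable(violations, tol):
    """At most one inversion, and it must not exceed the tolerance."""
    return not violations or (len(violations) == 1 and violations[0] <= tol)

def classify(values, tol=0.10):
    """Label a six-point membership function by shape (monotonic shapes tried first)."""
    if _acceptable(_violations(values, +1), tol):
        return "monotonic increasing"
    if _acceptable(_violations(values, -1), tol):
        return "monotonic decreasing"
    # Single peaked: rising up to some interior point k, falling afterwards.
    for k in range(1, len(values) - 1):
        v = _violations(values[:k + 1], +1) + _violations(values[k:], -1)
        if _acceptable(v, tol):
            return "single peaked"
    return "other"

# Hypothetical six-point functions, not data from the study.
print(classify([0.0, 0.2, 0.5, 0.8, 1.0, 0.95]))   # one small inversion at the end
print(classify([0.1, 0.5, 1.0, 0.6, 0.3, 0.0]))
```
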

Table 3

Classification of the Membership Functions in Session 3 by Shape

                           Monotonic    Monotonic    Single             Total
Probability Phrase         Increasing   Decreasing   Peaked    Other    Frequency

Possible                        6            3         22        9         40
Probable/Likely                17            2         20        1         40
Improbable/Unlikely             0           16         21        3         40
Rather (Prob./Likely)          10            0          6        4         20
Rather (Impr./Unlikely)         0            7         12        1         20
Very (Prob./Likely)            17            0          1        2         20
Very (Impr./Unlikely)           0           19          1        0         20


The monotonic and single peaked functions might all be
considered reasonable in terms of the underlying semantics,
whereas those classified as "other" cannot easily be so
considered. Table 3 shows that overall (allowing for no more
than one minor inversion, which occurred in about 20% of all
cases), 90% of the functions are reasonable by this criterion.
Arguing again in terms of the underlying semantics, one would
expect that monotonic functions for low phrases would be
decreasing, whereas monotonic functions for high phrases would be
increasing. Table 3 confirms this expectation remarkably well.
Excluding the neutral phrase possible, there are altogether 46
monotonic functions for the high phrases, 44 of which are
increasing, and there are 42 monotonic functions for the low
phrases all of which are decreasing.

One would also expect that phrases closer to the two
extremes of the probability range [0,1] will tend to have more
monotonic than single peaked functions, whereas phrases near the
middle of the probability range will tend to have more single
peaked than monotonic functions (Wallsten et al., 1985). The
two modifiers very and rather can be represented as operators
acting on membership functions (Zadeh, 1972). Membership
functions of phrases modified by very are expected to be more
frequently monotonic than functions of phrases modified by
rather. Table 3 confirms this prediction, too; of the 40
functions of phrases modified by very, 36 (90%) are monotonic,
whereas of the 40 functions of phrases modified by rather, only
17 (42.5%) are monotonic.
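
The percentages quoted in the last two paragraphs follow directly from the Table 3 frequencies, as this short tally shows. The dictionary layout and the names are illustrative; the counts are those of Table 3.

```python
# Table 3 rows: (monotonic increasing, monotonic decreasing, single peaked, other)
TABLE3 = {
    "Possible":                (6, 3, 22, 9),
    "Probable/Likely":         (17, 2, 20, 1),
    "Improbable/Unlikely":     (0, 16, 21, 3),
    "Rather (Prob./Likely)":   (10, 0, 6, 4),
    "Rather (Impr./Unlikely)": (0, 7, 12, 1),
    "Very (Prob./Likely)":     (17, 0, 1, 2),
    "Very (Impr./Unlikely)":   (0, 19, 1, 0),
}

def monotonic_share(prefix):
    """Return (monotonic count, total count) over rows whose label starts with `prefix`."""
    rows = [counts for label, counts in TABLE3.items() if label.startswith(prefix)]
    monotonic = sum(inc + dec for inc, dec, _, _ in rows)
    total = sum(sum(counts) for counts in rows)
    return monotonic, total

total = sum(sum(counts) for counts in TABLE3.values())
other = sum(counts[3] for counts in TABLE3.values())
reasonable_share = (total - other) / total

very_monotonic, very_total = monotonic_share("Very")
rather_monotonic, rather_total = monotonic_share("Rather")

print(total, reasonable_share)                       # 200 functions, 0.9 reasonable
print(very_monotonic / very_total)                   # 0.9
print(rather_monotonic / rather_total)               # 0.425
```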

Comparison of the PC and DE Procedures


Concluding that most of the membership functions are not
only stable over time but also reasonable and semantically
interpretable, we turn next to a comparison of the functions
elicited by the PC and DE procedures. An inspection of Figures 1
through 5 reveals that in virtually all cases the two procedures
yield the same shape function. However, with the exception of
the phrase possible in Figure 1, the functions elicited by the DE
procedure (solid lines) tend to lie above, or dominate, the
functions elicited by the PC procedure (broken lines). If two
membership functions A and B are defined over the same support
and function A has larger grades of membership than B, then the
concept giving rise to function A is more fuzzy or vague than the
one giving rise to function B.


The impression regarding dominance gained from inspecting
Figures 1 through 5 is substantiated by the frequencies presented
in Table 4. For the functions displayed in Figures 1 through 5,
we say that function A dominates function B if $\mu_A(x) > \mu_B(x)$ for
at least five of the six probabilities in the common
support of both functions, with equality holding only if $\mu_A(x) =
\mu_B(x) = 1$. Table 4 shows that the functions elicited by the DE
procedure dominate those elicited by the PC procedure in 52 of
the 100 cases, that the reverse occurs in 12 cases only, and that
in 37 cases neither member of the pair of functions dominates the
other. The null hypothesis of equal proportions of domination is
rejected for the total frequencies over phrases (right-hand
column of Table 4) by a sign test (p < 0.01). It is also
rejected for each of the five phrase types in Table 4 except
possible.
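
The dominance criterion can be sketched as a small function. The reading adopted here, that a tie counts toward dominance only where both functions equal 1, is an interpretation of the definition above; the function names and the example values are hypothetical.

```python
def dominates(a, b, min_points=5):
    """True if `a` lies above `b` at no fewer than `min_points` support points.

    A tie is counted only where both grades of membership equal 1.
    """
    count = sum(1 for ma, mb in zip(a, b) if ma > mb or (ma == mb == 1.0))
    return count >= min_points

def compare(de, pc):
    """Assign a pair of six-point functions to one of the three rows of Table 4."""
    if dominates(de, pc):
        return "DE dominates"
    if dominates(pc, de):
        return "PC dominates"
    return "Other"

# Hypothetical normalized functions over the same six probabilities.
de = [0.3, 0.5, 0.7, 0.9, 1.0, 1.0]
pc = [0.1, 0.2, 0.5, 0.8, 0.9, 1.0]
print(compare(de, pc))
```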

Table 4

Comparison of Membership Functions Elicited by the DE and PC Procedures

Columns (Probability Phrase): Possible, Probable/Improbable,
Likely/Unlikely, Rather ( ), Very ( ), Total

Rows: DE dominates, PC dominates, Other



Table  4  shows  that  the  DE  and  PC  procedures  do  not  elicit 
the  same  membership  functions  for  the  five  phrases  under 
consideration.  Although  the  functions  are  equally  stable  over 
time,  equally  reasonable,  and  of  roughly  the  same  shape,  the  ones 
elicited  by  the  DE  procedure  enhance  the  magnitude  of  the  grade 
of  membership  or,  equivalently,  the  ones  elicited  by  the  PC 
procedure  reduce  it. 

To choose between the two procedures, we invoked the
membership functions elicited by the RO procedure, which, as may
be recalled, are measurable on an ordinal scale only. For each
subject separately, a rank order correlation was computed between
the membership functions elicited by the DE and RO procedures in
Session 3. The results were pooled over the five phrases. All
20 correlations, each based on 30 pairs of observations, are
highly significant (p < 0.01), and their mean value is 0.82.
Similar rank order correlations were then computed between the
functions elicited by the PC and RO procedures. Again, all 20
correlations are highly significant (p < 0.01), but now the mean
value is slightly greater at 0.88. A sign test was used to test
the null hypothesis that the probability is 0.5 that a correlation
between DE and RO exceeds in magnitude the corresponding
correlation between PC and RO. In 15 of the 20 cases the
correlation between PC and RO exceeded the corresponding
correlation between DE and RO, leading us to reject the null
hypothesis (p = 0.02).
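
The sign test used here reduces to a binomial tail probability. The sketch below assumes a one-sided test; under that assumption, 15 successes in 20 cases give a tail probability of about 0.021, which appears consistent with the reported p = 0.02. The function name is hypothetical.

```python
from math import comb

def sign_test_p(successes, n):
    """One-sided tail P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n

# PC-RO exceeded DE-RO in 15 of the 20 subjects.
print(round(sign_test_p(15, 20), 4))
```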


A second comparison of the PC and DE procedures consisted of
computing the rank order correlation between the functions
elicited by the DE and PC procedures in Session 3 (as shown in
Figures 1 through 5) for each subject and phrase separately. The
mean rank order correlation is 0.85. As each individual
correlation was based on six pairs of observations only, the
criterion for rejecting the null hypothesis of zero correlation
(at p = 0.05) in each instance was set at 0.829. In 33 of the
100 cases the rank order correlation did not exceed this very
high criterion. For each of these 33 cases, a comparison was
made between the rank order correlation between DE and RO and the
rank order correlation between PC and RO. Of these 33
comparisons, the rank order correlation between PC and RO
exceeded the one between DE and RO in 24 cases, the reverse
occurred in 7 cases, and the two rank order correlations were
identical in 2 cases. The null hypothesis of equal proportions
of cases where one correlation exceeds the other was again
rejected by a sign test (p < 0.01). Although the RO procedure is
not posited as the sole criterion for validating the ordinal
properties of membership functions, the results of the last two
tests provide convergent evidence which points to the superiority
of the PC over the DE procedure.

Effects  of  very  and  rather 


A basic problem in quantitative fuzzy semantics is to devise
an algorithm for the computation of the meaning of a composite
term x from the knowledge of the meaning of each of its atomic
components. Zadeh (1972) addressed a special case of this
problem in which the composite term x is of the form x = mu, where
m is a modifier (e.g., m = very, rather, highly) and u is a
primary term (e.g., likely, intelligent). The fundamental idea is that
there is a small number of basic functions that, in combination,
produce a wide range of modifiers for fuzzy predicates (Lakoff,
1973). For example, very typically intensifies the particular
predicate it modifies. Thus, any attempt to model the effect of
very should decrease the degree of membership of those elements
in the fuzzy set that represent the modified term whose degree of
membership is less than one in the fuzzy set represented by the
unmodified term. A reasonable implementation of the modifier
very is based on the unary operator called concentration (CON).

If the result of applying a concentrator to A is denoted by
CON(A), then the relationship between the membership functions of
A and CON(A) may be given by

$$\mu_{CON(A)}(x) = \mu_A^r(x), \quad r > 1, \quad x \in U. \tag{3}$$

Although Zadeh (1972) proposed r = 2 for very (see also
Schmucker, 1984), he emphasized that his proposed representation
was intended mainly to illustrate his approach toward modeling
hedges rather than to provide accurate definitions. The exact
form of the function and the values assumed by its parameters can
only be determined empirically.

Another unary operation in fuzzy set theory, which also has
no counterpart in ordinary set theory, is that of contrast
intensification, or simply intensification (INT). The effect of
INT is to increase values of $\mu_A(x)$ that are above 0.5 and to
diminish those that are below this threshold. It may be said,
then, that INT heightens the contrast between the elements that
are more than half in the set and those that are less than half
in it (Schmucker, 1984). Zadeh (1972) proposed a simple concrete
expression for an operation of this type, which we generalize:

$$\mu_{INT(A)}(x) = \begin{cases} 2^{r-1}\mu_A^r(x), & \text{for } 0 \le \mu_A(x) \le 0.5 \\ 1 - 2^{r-1}(1 - \mu_A(x))^r, & \text{for } 0.5 \le \mu_A(x) \le 1 \end{cases} \tag{4}$$

where r > 1 and $x \in U$. Although Zadeh restricted himself
unnecessarily to r = 2, other values of r may achieve the same
desired effect on $\mu_A(x)$.
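
Both operators act pointwise on a grade of membership, so they can be sketched as two short functions. The function names are illustrative; r = 2 is the value Zadeh proposed.

```python
def con(mu, r=2.0):
    """Concentration, equation (3): raise the grade of membership to the power r > 1."""
    return mu ** r

def intensify(mu, r=2.0):
    """Contrast intensification, equation (4), for a grade of membership in [0, 1]."""
    if mu <= 0.5:
        return 2 ** (r - 1) * mu ** r
    return 1 - 2 ** (r - 1) * (1 - mu) ** r

# CON lowers every grade below 1; INT pushes grades away from 0.5.
for mu in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(mu, con(mu), intensify(mu))
```

The two branches of INT agree at a grade of 0.5 for every r, so the operator is continuous there.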


The membership functions of the four primary probability
phrases that were modified by very are shown in Figure 2
(probable and improbable) and Figure 3 (unlikely and likely).
Because of the slight superiority of the PC over the DE
procedure, we restrict our attention in the present section to PC
elicited membership functions only. Of the 20 PC elicited
functions, 10 are single peaked and 10 are monotonic. Inspection
of Figure 5, however, shows that for the terms modified by very
only 2 functions (subjects 15 and 18) are single peaked; of the
remaining 18 functions, 17 are monotonic and one (subject 13) is
classified as "other." Although subjects 15 and 18 each yielded
single peaked functions for both the primary and the modified
phrase, in neither case does very actually intensify the primary
phrase it modifies. Similarly, when the primary phrase has a
single peaked function and the modified phrase has a monotonic
function, very cannot be modeled as a concentrator.

We turn next to test Zadeh's concentration model, Eq. (3),
when the membership functions of both the primary and modified
phrases are monotonic. To do so, we first display Figure 6 to
illustrate how the CON operator works in this case. Figure 6
consists of three sections each divided into two parts. The
solid lines in the upper three parts describe three hypothetical
membership functions, which are not unlike those displayed in
Figures 2 and 3. All three functions are monotonic increasing
and share the same support. The function on the left is convex,
the one in the middle is linear ($\mu_A(x) = ax + b$), and the
one on the right is concave ($\mu_A(x) = a \log x + b$). The broken lines
represent the membership functions of the operator CON with r =
2. In each case, CON diminishes the degree of membership of the
elements whose degree of membership is less than one. Similar
functions with other values of r can be readily envisioned.

The three graphs in the lower part of Figure 6 display the
difference functions $\mu_A(x) - \mu_{CON(A)}(x)$. The three difference
functions are all single peaked; they assume zero values at the
two extremes of the probability range (0.50 and 0.95 in the
example) and positive values elsewhere. For
monotonically increasing functions, the difference function is
negatively skewed if $\mu_A(x)$ is convex, symmetric if $\mu_A(x)$ is
linear, and positively skewed if $\mu_A(x)$ is concave. For
monotonically decreasing functions, positively and negatively
skewed functions are interchanged. If very operates as a
concentrator on monotonic functions, then the observed difference
functions should have similar shapes to those displayed in Figure
6. A very similar approach has been developed by Yager (1982).
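
The skewness pattern can be checked numerically by locating the peak of the difference function for three increasing shapes over the support 0.50 to 0.95. The linear and logarithmic forms follow the description of Figure 6; the quadratic used for the convex case is an assumption made here for illustration. A peak to the right of the midpoint corresponds to negative skew, and a peak to the left to positive skew.

```python
import math

LOW, HIGH = 0.50, 0.95   # support of the hypothetical functions

def convex(t):
    # Assumed quadratic form; t is the position within the support, from 0 to 1.
    return t ** 2

def linear(t):
    return t

def concave(t):
    # a * log(x) + b, scaled so that the function runs from 0 at LOW to 1 at HIGH.
    x = LOW + (HIGH - LOW) * t
    return math.log(x / LOW) / math.log(HIGH / LOW)

def difference_function(mu, r=2.0, steps=90):
    """Return (x, mu(x) - mu(x)**r) on an evenly spaced grid over the support."""
    points = []
    for k in range(steps + 1):
        t = k / steps
        m = mu(t)
        points.append((LOW + (HIGH - LOW) * t, m - m ** r))
    return points

def peak_location(points):
    """Probability value at which the difference function is largest."""
    return max(points, key=lambda p: p[1])[0]

for name, mu in (("convex", convex), ("linear", linear), ("concave", concave)):
    print(name, round(peak_location(difference_function(mu)), 3))
```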

Experimental  evidence  concerning  the  effects  of  very  has  not 
supported  Zadeh's  conjecture.  Hersh  and  Caramazza  (1976)  and 
MacVicar-Whelan (1978) suggested instead that the function shifts
by a fixed constant (see also Lakoff, 1973). In other words,
rather than modeling very by equation (3), they proposed to model
it by $\mu_A(x+c)$, where c is positive if the membership function of
the  primary  predicate  is  monotonic  increasing  and  negative  if  it 
is  monotonic  decreasing.  It  follows  from  the  latter  model  that 
the  difference  function  should  have  a  zero  slope  for  all  monotonic 
functions,  a  positive  intercept  for  monotonic  increasing 
functions,  and  a  negative  intercept  for  monotonic  decreasing 
functions. 

Empirical functions, μ_W(x) - μ_very W(x), were computed
separately for each of the ten subjects who exhibited monotonic
membership functions for probable, improbable, unlikely, and
likely. Figure 7 displays these ten difference functions.

Because  the  membership  functions  are  normalized  to  the  same 
maximum,  the  right-hand  ordinates  of  the  top  five  figures  and  the 
left-hand ordinates of the bottom five figures must in general be
equal to zero. The other extreme points are not so constrained.
Seven of the ten difference functions (subjects 1, 5, 7, 12, 14,
16, and 20) in Figure 7 are seen to be in good agreement with
Zadeh's concentration model. Three other functions (subjects 3,
10, and 17) support neither of the two competing models.

The  membership  functions  of  the  four  probability  phrases 
modified by rather are shown in Figure 2 (probable and
improbable) and Figure 3 (unlikely and likely). Of the 20 PC
elicited functions for the primary phrases, 10 were classified as
single peaked, 9 as monotonic, and 1 (subject 6 in Figure 2) as
"other." Empirical difference functions, μ_W(x) - μ_rather W(x),
were  computed  for  all  19  subjects  with  single  peaked  or  monotonic 
functions.  Figure  8  shows  the  difference  functions  for  "single 
peaked"  subjects,  and  Figure  9  shows  the  ones  for  "monotonic" 
subjects. 

Figure 9. Empirical difference functions, μ_W(x) - μ_rather W(x),
for the 9 subjects with monotonic μ_W(x).


Two models were tested for the effect of rather on
probability phrases. The first model contends that rather
increases the fuzziness of the elements of the fuzzy set that
represents the modified term. This effect may be modeled by
another unary operation called dilation (DIL), which has the
opposite effect of concentration (Zadeh, 1972). The relationship
between the membership functions of A and DIL(A) is given again
by equation (3) with r < 1 (Zadeh proposed r = 1/2). The second
model that we tested is due to Lakoff (1973), who suggested that
the effect of rather be modeled by the compound operation
INT(CON(x)), where CON is defined by equation (3) and INT by
equation (4).

We computed and studied the difference functions for DIL(x),
μ_W(x) - μ_DIL(W)(x), and for INT(CON(x)), μ_W(x) - μ_INT(CON(W))(x),
using several hypothetical single peaked and monotonic functions.
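
The two candidate models for rather can be sketched in a few lines. This
is an illustration under stated assumptions, not the computation
reported here. It assumes that equation (3) is the power transform
[μ(x)]^r, and that INT in equation (4) is Zadeh's usual contrast
intensification (2μ^2 for μ at or below 0.5, and 1 - 2(1 - μ)^2 above
it). The two input functions and all names are hypothetical.

```python
# Sketch: difference functions under the two models for "rather".
# Assumptions: DIL(mu) = mu ** 0.5, CON(mu) = mu ** 2, and INT is
# Zadeh's contrast intensification (assumed form of equation (4)).

def power_transform(mu, r):
    """Equation (3) as assumed here: raise membership values to power r."""
    return [m ** r for m in mu]

def intensify(mu):
    """Assumed INT: push values below 0.5 down and values above 0.5 up."""
    return [2 * m * m if m <= 0.5 else 1 - 2 * (1 - m) ** 2 for m in mu]

def difference(mu, modified):
    """mu_W(x) - mu_modified(x), point by point."""
    return [a - b for a, b in zip(mu, modified)]

ts = [i / 20 for i in range(21)]             # rescaled probability axis
monotonic = list(ts)                         # hypothetical increasing function
peaked = [1 - abs(2 * t - 1) for t in ts]    # hypothetical single peaked function

def dil_model(mu):
    return power_transform(mu, 0.5)

def int_con_model(mu):
    return intensify(power_transform(mu, 2.0))

d_dil_mono = difference(monotonic, dil_model(monotonic))
d_lak_mono = difference(monotonic, int_con_model(monotonic))
d_dil_peak = difference(peaked, dil_model(peaked))
d_lak_peak = difference(peaked, int_con_model(peaked))
```

With these assumed forms, the dilation difference function is never
positive, because the square root of a membership value is at least as
large as the value itself. The INT(CON) difference function is positive
at low and middle membership values and negative near the maximum. The
two models therefore predict qualitatively different patterns for the
same input function.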

Several  tests  that  we  performed  on  the  observed  difference 
functions,  the  details  of  which  are  omitted  here,  failed  to 
corroborate  either  of  the  two  models.  Inspection  of  Figures  8 
and  9  does  not  reveal  any  consistent  patterns,  suggesting, 
perhaps, that the use of rather to modify probability phrases is
highly idiosyncratic.

Discussion

The  organization  of  this  section  is  as  follows:  We  first 
compare the present results to those obtained by Wallsten et al.
(1985). This is followed by a comparison of the PC and DE
procedures.  Next,  the  effects  of  the  modifiers  are  considered. 
Finally,  we  return  to  the  issue  of  scale  type. 


Wallsten et al. (1985) employed the graded PC procedure in two
experiments to establish membership functions for probability
phrases similar to those used here. Judgments were highly stable
over two sessions, and the additive difference axioms were
satisfied to a high degree. Furthermore, the derived membership
functions representing individuals' understandings of a given
phrase varied greatly from one person to another, but were roughly
constant over time for each person. Finally, the shapes of the
functions were generally semantically reasonable, in that they
were primarily single peaked or monotonic, with single peaked
functions predominating for phrases towards the center of the
[0,1] interval, monotonically decreasing functions predominating
near the low end, and monotonically increasing functions
predominating near the high end of the interval. Because the
additive-difference axioms were so well satisfied in the previous
study, we did not check them with the PC judgments in the present
experiment. However, in all other respects the present results
duplicate those of the prior study.

There  is  one  illuminating  difference  between  the  present  and 
preceding  PC  results.  As  already  indicated,  one  cannot 
distinguish  on  ordinal  grounds  whether  subjects  are  making  ratio 
or  difference  judgments.  In  the  preceding  study  it  was  also  not 
possible  to  distinguish  the  two  kinds  of  judgments  on  numerical 
grounds,  because  the  two  types  of  scaling  models  tended  to 
describe  the  judgments  equally  well.  This  was  not  the  case  with 
the  present  data,  however.  In  the  previous  experiments,  the 
probability  values  judged  for  each  phrase  were  quite  closely 
spaced, whereas in this study they were separated by 0.09. As a
result, there was more opportunity for the ratio and difference
models to differentially fit the data, and they did. Uniformly,
the difference model out-performed the ratio model, suggesting
that in the PC situation subjects were judging differences rather
than ratios. The same conclusion has been reached in other
measurement contexts on the basis of other experimental
manipulations (Birnbaum, 1980).
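
The ordinal equivalence of ratio and difference judgments follows from
the monotonicity of the logarithm: ordering pairs by the ratio a/b gives
the same order as ordering them by the difference log(a) - log(b). The
sketch below illustrates this with four hypothetical scale values; it is
not a reanalysis of the data. It also shows that the two readings
disagree numerically, which is what allows them to be separated when the
stimuli are widely spaced.

```python
# Sketch: ratio and difference judgments cannot be told apart by order
# alone. The four scale values below are hypothetical.
import math

values = [0.10, 0.25, 0.60, 0.90]
pairs = [(a, b) for a in values for b in values if a != b]

# Rank every ordered pair by the ratio of its scale values...
order_by_ratio = sorted(pairs, key=lambda p: p[0] / p[1])

# ...and by the difference of the log-transformed scale values.
order_by_log_difference = sorted(
    pairs, key=lambda p: math.log(p[0]) - math.log(p[1])
)

same_order = order_by_ratio == order_by_log_difference

# The numerical predictions do differ: raw differences of the same
# scale values give a different ordering of the pairs.
order_by_raw_difference = sorted(pairs, key=lambda p: p[0] - p[1])
raw_differs = order_by_raw_difference != order_by_ratio

print(same_order, raw_differs)
```

Here the pair (0.25, 0.10) has a larger ratio than (0.60, 0.25) but a
smaller raw difference, so a model fitted to the judged magnitudes can
favor one interpretation even though the rank order of the judgments
cannot.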


The importance of the extensive individual differences in
the present and prior results should not be minimized. It is
clear that averaging over subjects would have been meaningless.
Furthermore, these results should be troublesome to designers of
expert systems who wish to incorporate membership functions
representing the "meanings" of particular fuzzy terms.


Comparing the PC and DE procedures. On theoretical grounds
alone, the PC procedure is more justifiable than the DE
procedure. First, the PC procedure requires comparative rather
than absolute judgments, which generally are easier for people to
make. Second, the nature of the paradigm imposes sufficient
structure on the data that it is possible to reject a set of
judgments as being inconsistent with the underlying model.
Additional forms of validation, such as consideration of the
shapes of the membership functions, or use of the membership
functions to predict independent sets of judgments, are
available, of course, for both procedures. Conversely, on
pragmatic grounds the DE procedure is preferable to the PC
because it requires so many fewer judgments.




As  pointed  out  earlier,  a  strong  form  of  joint  validation 
occurs when the two procedures give rise to the same membership
functions. Ideally, one would like the resulting functions to be
independent  of  the  method  of  measurement,  and  when  this  occurs 
confidence  is  increased  that  the  two  methods  are  representing  the 
same  underlying  construct. 

In  fact,  although  the  two  methods  yielded  similar  functions 
in  virtually  all  cases,  they  rarely  yielded  identical  functions. 
The  functions  did  not  differ  systematically  in  the  case  of 
possible, but for the other phrases the DE functions generally
dominated  the  PC  ones.  In  other  words,  the  DE  functions 
suggested  that  the  phrases  were  more  fuzzy  than  did  the  PC 
functions. 
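
Dominance in this sense means that the DE function lies on or above the
PC function at every probability value. The sketch below shows one way
to check this for a single subject and phrase. All numbers in it are
hypothetical and are not taken from the data, and the mean membership
value is used only as a crude, assumed index of how widely the function
is spread.

```python
# Sketch: checking whether one membership function dominates another.
# The probability grid and both functions are hypothetical.

def dominates(mu_a, mu_b, tol=1e-9):
    """True if mu_a is at least as large as mu_b at every point."""
    return all(a >= b - tol for a, b in zip(mu_a, mu_b))

def mean_membership(mu):
    """Crude index of spread: average membership over the grid."""
    return sum(mu) / len(mu)

probs = [0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
mu_pc = [0.00, 0.10, 0.45, 1.00, 0.40, 0.05, 0.00]   # hypothetical PC function
mu_de = [0.05, 0.30, 0.60, 1.00, 0.65, 0.25, 0.10]   # hypothetical DE function

de_dominates_pc = dominates(mu_de, mu_pc)
pc_dominates_de = dominates(mu_pc, mu_de)

print(de_dominates_pc, pc_dominates_de)
print(mean_membership(mu_de) > mean_membership(mu_pc))
```

In this made-up case both functions are normalized to the same maximum,
and the dominating function assigns non-negligible membership to more of
the probability range, which is the sense in which the phrase appears
more fuzzy.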

Generally when the PC and DE functions differed, the PC
function agreed more strongly with the rank ordering (RO) results
than did the DE. Because the RO procedure involves only comparison
judgments without quantitative evaluations, it is reasonable to
think that it is the simplest of the three, and therefore that
its results come closest to representing the underlying ordinal
characteristics of the meanings of the phrases. To the degree
that this assumption about the RO characteristics is accepted, it
can be said that the PC procedure yields more veridical
representations than does the DE procedure.

These  results  can  be  considered  from  both  theoretical  and 
practical  perspectives.  Considering  the  theoretical  issues 
first, Wallsten (1974) pointed out that unlike measurement in
classical physics, the quantities being measured are not
independent of the instruments doing the measuring. Thus, for
example, one may compare weights on a pan balance, secure in the
knowledge that (unlike in quantum physics) the act of measuring
does not affect the weights. However, when two degrees of
membership are compared, those degrees are internal to the same
organism doing the comparison, and therefore one cannot consider
the quantities being measured independently of the method of
measurement. Thus, one possibility is that the meaning of a
vague concept in a particular situation to a particular
individual actually changes with the method of interrogation. A
second interpretation is that the meaning stays relatively fixed
over measurement procedures, but some procedures lead to less
distortion than others. According to this second view,
comparative non-quantitative judgments are the simplest to make,
and  therefore  the  most  accurate  (although  frequently  the  least 
informative),  while  quantitative  judgments  involve  comparison 
against  an  internal  number  scale  which  introduces  additional 
measurement  error  and  therefore  judgments  that  are  somewhat  more 
similar  to  each  other.  These  two  views  cannot  be  distinguished 
on  the  basis  of  the  present  data,  but  the  latter  seems  more 
sensible  to  us. 

On  practical  grounds,  the  outcomes  of  the  two  procedures  are 
not  so  different.  Thus,  the  less  taxing  DE  method  may  frequently 
yield  measures  that  are  sufficiently  good  for  a  particular 
purpose.

Effects of modifiers. It is frequently difficult to determine
which of two or more functions better fits errorful data,
particularly when the functions include free parameters. The
comparison is easiest when it can be done on the basis of
qualitative rather than statistical characteristics of the data.
This is precisely what we did in testing models for the effects of
very and rather (cf. Figures 6-9).

Considering rather first, the two models tested above are
both inconsistent with the responses of our subjects.
Furthermore, as the difference functions for rather show, the
modifier has no consistent effect over subjects. It is possible
that the term is completely empty and provides no modifying
effect at all, but it is rather more likely that the term has
differential effects for different people, depending on their
linguistic background. Thus, for some people rather might serve
as a hedge, i.e., move the meaning of the phrase more toward the
center of the [0,1] interval, while for other people it serves as
an intensifier, and for a third group of people it has, in fact,
no real meaning. If this is the case, then the modifier rather
should be eschewed in the development of systems for specific
applied purposes.

The effect of very, however, is clear. Not surprisingly,
very acts as an intensifier. It is particularly interesting that
18 of the 20 membership functions for phrases including very were
monotonic, despite the fact that 10 of the functions for the base
term were single peaked. Thus, very not only intensifies, but it
also eliminates any hedging quality to the meaning of the phrase.
(Wallsten et al., 1985, distinguished between concepts with
hedged and unhedged meanings in that the former had membership
values of zero at both the upper and lower boundaries of their
domain of support, while the latter had membership function
values of zero only at one or the other boundary.)

No available descriptive model for the effects of very can
handle this full range of results. However, when attention is
restricted only to those phrases for which the membership
function of the unmodified term was itself monotonic, i.e., the
term already had an unhedged meaning to the subject, then Zadeh's
(1972) concentration model worked well in 7 of the 10 cases.

There was no indication whatsoever that very was better
represented by a shift than by a concentration function.

Scale  of  measurement.  As  already  indicated,  we  can  consider 
the  minimum  scale  of  measurement  guaranteed  by  our  procedures, 
and  then  whether  assumptions  are  justified  or  required  to  yield  a 
yet  stronger  measurement  scale.  The  PC  procedure,  based  on  the 
additive difference axioms, guarantees interval scale measurement.
Because  of  the  lack  of  internal  structure  in  the  DE  procedure, 
its  scale  properties  are  not  so  easily  derivable.  In  order  to 
claim  ratio  scale  measurement  we  would  need  empirical  grounds  for 
establishing  membership  functions  that  are  truly  zero.  While 
such  procedures  may  be  possible  in  a  different  study,  they  were 
not  invoked  here,  and  therefore  we  have  no  basis  for  such  a 
claim.  However,  our  comparisons  of  the  PC  and  the  DE  procedures, 
except  for  the  correlations,  as  well  as  our  examination  of  the 
effects  of  modifiers,  implicitly  involved  an  additional 
assumption  regarding  the  scale  properties.  Namely,  for  each 
subject  we  assumed  that  both  the  DE  and  the  PC  functions,  as  well 
as  functions  for  different  phrases,  were  all  unique  up  to  common 
linear transformations. Although this assumption remains
untestable, it yields a pattern of results that is sensible and
interpretable, and on these pragmatic grounds the assumption may
be considered reasonable.
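
The distinction between interval and ratio scales drawn above can be
illustrated numerically. Under a linear transformation u' = a*u + b with
a > 0, ratios of differences between scale values are preserved, whereas
ratios of the scale values themselves are not unless b = 0, that is,
unless the zero point is fixed. The values and the constants a and b in
this sketch are hypothetical.

```python
# Sketch: what a linear transformation preserves on an interval scale.
# Scale values and transformation constants are hypothetical.

def linear(u, a, b):
    """Apply u' = a * u + b to every scale value (a > 0)."""
    return [a * v + b for v in u]

def ratio_of_differences(u, i, j, k, l):
    """(u[i] - u[j]) / (u[k] - u[l]); meaningful on an interval scale."""
    return (u[i] - u[j]) / (u[k] - u[l])

u = [0.10, 0.40, 0.70, 1.00]     # hypothetical membership values
v = linear(u, a=2.0, b=3.0)      # an equally valid interval-scale version

# Ratios of differences are unchanged by the transformation.
rd_u = ratio_of_differences(u, 3, 1, 2, 0)
rd_v = ratio_of_differences(v, 3, 1, 2, 0)

# Ratios of the values themselves are not.
ratio_u = u[3] / u[0]
ratio_v = v[3] / v[0]

print(round(rd_u, 6), round(rd_v, 6))
print(round(ratio_u, 4), round(ratio_v, 4))
```

The same reasoning shows why a common transformation matters when
functions are subtracted from one another: if two functions were
rescaled with different constants, the shape of their difference
function would no longer be determined by the judgments alone.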